We address the problem of improving cache predictability and performance in embedded
systems through the use of software-assisted replacement mechanisms. These mechanisms 
require additional software controlled state information that affects the cache replacement 
decision. Software instructions allow a program to kill a particular cache element, i.e., 
effectively make the element the least recently used element, or keep that cache element, 
i.e., the element will never be evicted. We prove basic th ... 
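
The kill and keep operations described above can be illustrated with a minimal sketch of one LRU cache set. The class name `KillKeepLRUSet`, its methods, and the choice to bypass the set when every line is kept are assumptions made for illustration; the abstract does not specify them.

```python
from collections import OrderedDict


class KillKeepLRUSet:
    """One cache set with LRU replacement plus software kill/keep hints (illustrative)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, ordered from LRU to MRU
        self.kept = set()           # tags pinned by keep()

    def access(self, tag):
        """Return True on a hit; on a miss, fill the set and evict if needed."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # becomes most recently used
            return True
        if len(self.lines) >= self.ways:
            # Victim is the least recently used line that is not kept.
            victim = next((t for t in self.lines if t not in self.kept), None)
            if victim is None:
                return False  # assumption: bypass the set when all lines are kept
            del self.lines[victim]
        self.lines[tag] = None
        return False

    def kill(self, tag):
        """Make a resident line the least recently used one."""
        if tag in self.lines:
            self.lines.move_to_end(tag, last=False)
            self.kept.discard(tag)

    def keep(self, tag):
        """Pin a resident line so it is never chosen as a victim."""
        if tag in self.lines:
            self.kept.add(tag)
```

In this sketch a killed line is simply moved to the LRU end of the ordering, so it is the next victim unless it is touched again first.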

Antonio Gonzalez, Mateo Valero, Nigel Topham, Joan M. Parcerisa
support for programming languages and operating systems, volume 33, 32
issue 11, 5

Full text available: pdf(1.50 MB) Additional Information: full citation, abstract, references, citings, index terms

Memory references exhibit locality and are therefore not uniformly distributed across the
sets of a cache. This skew reduces the effectiveness of a cache because it results in the 
caching of a considerable number of less-recently-used lines which are less likely to be re- 
referenced before they are replaced. In this paper, we describe a technique that dynamically 
identifies these less-recently-used lines and effectively utilizes the cache frames they occupy 
to more accurately approximate the glob ... 

7 LRU-based column-associative caches 
Byung-Kwon Chung, Jih-Kwon Peir 

May 1998 ACM SIGARCH Computer Architecture News, volume 26 issue 2 

Full text available: pdf(859.94 KB) Additional Information: full citation, abstract, citings, index terms

The column-associative cache is a direct-mapped cache that may be accessed more than 
once, each time with a different hash function, to satisfy a memory request. In the column-
associative cache, the possible locations in which a line can reside define the column. The
original scheme relies on a rehash bit array to guide the replacement policy within the 
column. A subsequent proposal uses an index vector to speed up the cache search within 
the column. In this paper, we consider the idea of using ... 
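
A minimal sketch of the multi-probe lookup described above follows. It assumes two hash functions, where the second flips the highest index bit, and it swaps lines on a second-probe hit; the rehash bit array and index vector mentioned in the abstract are omitted. The class name `ColumnAssociativeCache` and the stored full line addresses are illustrative assumptions.

```python
class ColumnAssociativeCache:
    """Direct-mapped array probed with two hash functions (illustrative sketch)."""

    def __init__(self, index_bits):
        self.index_bits = index_bits
        self.lines = [None] * (1 << index_bits)  # each slot holds a full line address

    def _h1(self, addr):
        # Conventional direct-mapped index: low-order bits of the line address.
        return addr & ((1 << self.index_bits) - 1)

    def _h2(self, addr):
        # Assumed alternate hash: same index with the highest index bit flipped.
        return self._h1(addr) ^ (1 << (self.index_bits - 1))

    def access(self, addr):
        """Return 'first-hit', 'second-hit', or 'miss'."""
        i, j = self._h1(addr), self._h2(addr)
        if self.lines[i] == addr:
            return "first-hit"
        if self.lines[j] == addr:
            # Swap so the next access to this line hits on the first probe.
            self.lines[i], self.lines[j] = self.lines[j], self.lines[i]
            return "second-hit"
        if self.lines[i] is not None:
            self.lines[j] = self.lines[i]  # displaced line moves to the alternate slot
        self.lines[i] = addr
        return "miss"
```

With `index_bits=3`, line addresses 0 and 8 share the first-probe slot, yet both remain resident because one of them is found at the alternate slot.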

8 Cache replacement with dynamic exclusion 
Scott McFarling 

April 1992 ACM SIGARCH Computer Architecture News, Proceedings of the 19th
annual international symposium on Computer architecture, volume 20 issue 2

Full text available: pdf(894.10 KB) Additional Information: full citation, abstract, references, citings, index terms

Most recent cache designs use direct-mapped caches to provide the fast access time 
required by modern high-speed CPUs. Unfortunately, direct-mapped caches have higher
miss rates than set-associative caches, largely because direct-mapped caches are more 
sensitive to conflicts between items needed frequently in the same phase of program 
execution. This paper presents a new technique for reducing direct-mapped cache misses 
caused by conflicts for a particular cache line. A small fi ... 

9 Set-associative cache simulation using generalized binomial trees 
Rabin A. Sugumar, Santosh G. Abraham 

February 1995 ACM Transactions on Computer Systems (TOCS), volume 13 issue 1

Set-associative caches are widely used in CPU memory hierarchies, I/O subsystems, and file
systems to reduce average access times. This article proposes an efficient simulation 
technique for simulating a group of set-associative caches in a single pass through the 
address trace, where all caches have the same line size but varying associativities and 
varying number of sets. The article also introduces a generalization of the ordinary binomial 
tree and presents a representation of caches in ... 

Keywords: all-associativity simulation, binomial tree, cache modeling, inclusion properties, 
set-associative caches, single-pass simulation, trace-driven simulation 



10 CAT — caching address tags: a technique for reducing area cost of on-chip caches
Hong Wang, Tong Sun, Qing Yang 

May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture, volume 23 issue 2

Full text available: pdf(1.36 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents a technique for minimizing chip-area cost of implementing an on-chip
cache memory of microprocessors. The main idea of the technique is Caching Address Tags, or
CAT cache for short. The CAT cache exploits the locality property that exists among addresses of
memory references for the purpose of minimizing chip area-cost of address tags. By keeping
only a limited number of distinct tags of cached data rather than having as many tags as 
cache lines, the CAT ... 

11 A new cache replacement scheme based on backpropagation neural networks
Humayun Khalid 

March 1997 ACM SIGARCH Computer Architecture News, volume 25 issue 1
Full text available: pdf(572.97 KB) Additional Information: full citation, abstract, index terms

In this paper, we present a new neural network-based algorithm, KORA (Khalid ShadOw
Replacement Algorithm), that uses a backpropagation neural network (BPNN) for the purpose
of guiding the line/block replacement decisions in cache. This work is a continuation of our
previous research presented in [1]-[3]. The KORA algorithm attempts to approximate the
replacement decisions made by the optimal scheme (OPT). The key to our algorithm is to 
identify and subsequently ... 

Keywords: cache memory, neural networks, performance evaluation 



12 Performance of the KORA-2 cache replacement scheme 
Humayun Khalid 

September 1997 ACM SIGARCH Computer Architecture News, volume 25 issue 4 
Full text available: pdf(630.87 KB) Additional Information: full citation, abstract, index terms

In this paper, we propose a new strategy (KORA-2) for the replacement of lines in cache 
memories. The algorithm is efficient and easily implementable. It is basically an extension of 
our previous work presented in [1]-[5]. Key to our algorithm is to identify and discard
inactive lines relatively quickly as opposed to the conventional replacement algorithms. 
Trace-driven simulations were performed for 42 different cache configurations using 
benchmark programs from SPEC92 (Standard performance Eva ... 

Keywords: cache memory, neural networks, performance evaluation

controlled caches

Derek Chiou, Prabhat Jain, Larry Rudolph, Srinivas Devadas 

June 2000 Proceedings of the 37th conference on Design automation 

Full text available: pdf(76.30 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose a way to improve the performance of embedded processors running data- 
intensive applications by allowing software to allocate on-chip memory on an application- 
specific basis. On-chip memory in the form of cache can be made to act like scratch-pad 
memory via a novel hardware mechanism, which we call column caching. Column caching 
enables dynamic cache partitioning in software, by mapping data regions to a specified set
of cache "columns" or "ways ... 

14 An integrated memory management scheme for dynamic alias resolution 
Tzi-cker Chiueh 

August 1991 Proceedings of the 1991 ACM/IEEE conference on Supercomputing 

Full text available: pdf(944.90 KB) Additional Information: full citation, references, citings, index terms

Brad Calder, Dirk Grunwald

May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture, volume 23 issue 2

Full text available: pdf(1.25 MB) Additional Information: full citation, abstract, references, citings, index terms

Accurate instruction fetch and branch prediction is increasingly important on today's wide- 
issue architectures. Fetch prediction is the process of determining the next instruction to 
request from the memory subsystem. Branch prediction is the process of predicting the 
likely outcome of branch instructions. Several researchers have proposed very effective
fetch and branch prediction mechanisms including branch target buffers (BTB) that store the 
target addresses of taken branches. An alternative ... 

16 Efficient simulation of caches under optimal replacement with applications to miss 
characterization 

Rabin A. Sugumar, Santosh G. Abraham 

June 1993 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 
1993 ACM SIGMETRICS conference on Measurement and modeling of 
computer systems, Volume 21 issue 1 

Full text available: pdf(1.26 MB) Additional Information: full citation, references, citings, index terms



17 Architecture & distributed systems: Performance evaluation of cache replacement 
policies for the SPEC CPU2000 benchmark suite 
Hussein Al-Zoubi, Aleksandar Milenkovic, Milena Milenkovic 
April 2004 Proceedings of the 42nd annual Southeast regional conference 

Full text available: pdf(251.03 KB) Additional Information: full citation, abstract, references

Replacement policy, one of the key factors determining the effectiveness of a cache,
becomes even more important with the latest technological trends toward highly associative
caches. State-of-the-art processors employ various policies such as Random, Least
Recently Used (LRU), Round-Robin, and PLRU (Pseudo LRU), indicating that there is no
common wisdom about the best one. The optimal yet unattainable policy would replace the cache
memory block whose next reference is the farthest away in the future, a ...
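
The optimal policy mentioned above can be simulated offline when the whole reference trace is known in advance. The following is a minimal sketch for a fully associative cache; the function name `opt_misses` and the trace representation (a list of block identifiers) are assumptions for illustration.

```python
def opt_misses(trace, capacity):
    """Count misses of a fully associative cache under farthest-next-use replacement."""
    cache = set()
    misses = 0
    for pos, block in enumerate(trace):
        if block in cache:
            continue  # hit: nothing to update, the policy only looks forward
        misses += 1
        if len(cache) >= capacity:

            def next_use(b):
                # Position of the next reference to b, or infinity if none remains.
                for k in range(pos + 1, len(trace)):
                    if trace[k] == b:
                        return k
                return float("inf")

            # Evict the block whose next reference is farthest in the future.
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses
```

For the trace `[1, 2, 3, 1, 2]` with capacity 2, this sketch counts 4 misses, whereas LRU would miss on all 5 references. The linear scan for the next use keeps the example short; it is not meant to be efficient.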

18 Runtime identification of cache conflict misses: The adaptive miss buffer
Jamison D. Collins, Dean M. Tullsen 

November 2001 ACM Transactions on Computer Systems (TOCS), volume 19 issue 4 
Full text available: pdf(1.08 MB) Additional Information: full citation, abstract, references, index terms

This paper describes the miss classification table, a simple mechanism that enables the 
processor or memory controller to identify each cache miss as either a conflict miss or a 
capacity (non-conflict) miss. The miss classification table works by storing part of the tag of 
the most recently evicted line of a cache set. If the next miss to that cache set has a 
matching tag, it is identified as a conflict miss. This technique correctly identifies 88%
of misses. Several applications of this i ... 

Keywords: Cache architecture, adaptive miss buffer, cache exclusion, conflict misses, 
prefetching, victim cache 
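
The miss classification table described in this abstract can be sketched as one partial tag per cache set. The class name, method names, and the 8-bit partial tag width are assumptions for illustration; the abstract only says that part of the tag is stored.

```python
class MissClassificationTable:
    """Per-set record of the most recently evicted partial tag (illustrative sketch)."""

    def __init__(self, num_sets, tag_bits=8):
        self.mask = (1 << tag_bits) - 1  # assumed partial-tag width
        self.last_evicted = [None] * num_sets

    def classify_miss(self, set_index, tag):
        """Classify a miss to set_index as 'conflict' or 'capacity'."""
        partial = tag & self.mask
        return "conflict" if self.last_evicted[set_index] == partial else "capacity"

    def record_eviction(self, set_index, tag):
        """Remember part of the tag of the line just evicted from set_index."""
        self.last_evicted[set_index] = tag & self.mask
```

Because only part of the tag is stored, two different tags with the same low-order bits are indistinguishable, which is one source of misclassification in a table like this.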



19 The difference-bit cache 

Toni Juan, Tomas Lang, Juan J. Navarro 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture, volume 24 issue 2

Full text available: pdf(798.80 KB) Additional Information: full citation, abstract, references, citings, index terms

The difference-bit cache is a two-way set-associative cache with an access time that is
smaller than that of a conventional one and close or equal to that of a direct-mapped cache. 
This is achieved by noticing that the two tags for a set have to differ at least by one bit and 
by using this bit to select the way. In contrast with previous approaches that predict the 
way and have two types of hits (primary of one cycle and secondary of two to four cycles), 
all hits of the difference-bit cache are ... 
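
The way-selection idea in this abstract can be sketched as follows. Picking the lowest differing bit and recomputing it on every access are assumptions made to keep the example short; a hardware design would presumably store the bit position per set and update it on replacement.

```python
def difference_bit(tag0, tag1):
    """Index of the lowest bit position at which two distinct tags differ."""
    diff = tag0 ^ tag1
    if diff == 0:
        raise ValueError("the two tags in one set must be distinct")
    return (diff & -diff).bit_length() - 1  # isolate the lowest set bit


def select_way(tags, request_tag):
    """Pick the only way that can hold request_tag, then confirm with one tag compare."""
    bit = difference_bit(tags[0], tags[1])
    # The requested tag can only match the way whose tag agrees with it at the difference bit.
    way = 0 if ((request_tag >> bit) & 1) == ((tags[0] >> bit) & 1) else 1
    hit = tags[way] == request_tag
    return way, hit
```

For tags `0b1010` and `0b1110` the difference bit is bit 2, so a request for `0b1110` selects way 1 before the full tag comparison completes.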

20 Using virtual lines to enhance locality exploitation 
O. Temam, Y. Jegou 

July 1994 Proceedings of the 8th international conference on Supercomputing 

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, citings, index terms

Because the spatial locality of numerical codes is significant, the potential for performance
improvements is large. However, large cache lines cannot be used in current on-chip
data caches because of the significant pollution they cause. In this paper, we propose a
hardware design, called the Virtual Line Scheme, that allows the utilization of large virtual 
cache lines when fetching data from memory for better exploitation of spatial locality, while 
the ... 

Keywords: cache architecture, memory hierarchy, numerical codes, spatial locality, 
temporal locality 

Reducing power in superscalar processor caches using subbanking, multiple line 
buffers and bit-line segmentation 
Kanad Ghose, Milind B. Kamble 

August 1999 Proceedings of the 1999 international symposium on Low power 
electronics and design 

Sandeep Sen, Siddhartha Chatterjee, Neeraj Dumir
November 2002 Journal of the ACM (JACM), volume 49 issue 6

We present a model that enables us to analyze the running time of an algorithm on a
computer with a memory hierarchy with limited associativity, in terms of various cache 
parameters. Our cache model, an extension of Aggarwal and Vitter's I/O model, enables us 
to establish useful relationships between the cache complexity and the I/O complexity of 
computations. As a corollary, we obtain cache-efficient algorithms in the single-level cache 
model for fundamental problems like sorting, FFT, and an i ...

Information integrity in cache memories is a fundamental requirement for dependable
computing. Conventional architectures for enhancing cache reliability using check codes 
make it difficult to trade between the level of data integrity and the chip area requirement. 
We focus on transient fault tolerance in primary cache memories and develop new 
architectural solutions, to maximize fault coverage when the budgeted silicon area is not 
sufficient for the conventional configuration of an error checki ... 

This paper introduces an innovative cache design for vector computers, called prime-
mapped cache. By utilizing the special properties of a Mersenne prime, the new design does
not increase the critical path length of a processor, nor does it increase the cache access 
time as compared to a direct-mapped cache. The prime-mapped cache minimizes cache 
miss ratio caused by line interferences that have been shown to be critical for numerical 
applications by previous investigators. We show that sig ... 
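
One property of Mersenne primes that makes such a mapping cheap is that reduction modulo 2^c - 1 needs only shifts, masks, and adds. The sketch below assumes the set index is the line address modulo a Mersenne prime; the function name `mersenne_mod` and that indexing rule are illustrative assumptions, since the truncated abstract does not spell out the mapping.

```python
def mersenne_mod(x, c):
    """Compute x mod (2**c - 1) for non-negative x using shifts, masks, and adds."""
    m = (1 << c) - 1
    while x > m:
        # 2**c is congruent to 1 modulo m, so the high part folds onto the low part.
        x = (x & m) + (x >> c)
    return 0 if x == m else x
```

With `c = 5` (modulus 31), line addresses at a power-of-two stride of 32 land in 31 distinct sets, whereas a conventional 32-set index taken from the low-order bits would map all of them to set 0.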

Cache storage is a proven memory speedup technique in large mainframe computers. Two
of the main difficulties associated with the use of this concept in small machines are the high 
relative cost and complexity of the cache controller. An LSI bit-slice chip set is described 
which should reduce both controller cost and complexity. The set will enable a memory 
designer to construct a wide variety of cache structures with a minimum number of 
components and interconnections. Design parameters ar ... 

Recent work has shown that a small number of distinct frequently occurring values often
account for a large portion of memory accesses. In this paper we demonstrate how this 
frequent value phenomenon can be exploited in designing a cache that trades off 
performance with energy efficiency. We propose the design of the Frequent Value Cache 
(FVC) in which storing a frequent value requires few bits as they are stored in encoded form 
while all other values are stored in unencoded form using 32 bits. ... 
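
The encoded versus unencoded storage described above can be sketched as a small lookup against a table of frequent values. The function names, the tuple representation, and the example table are assumptions for illustration and do not reflect the actual FVC organization.

```python
def encode_word(value, frequent_values):
    """Return (is_frequent, payload): a short table index if frequent, else the raw 32-bit word."""
    if value in frequent_values:
        return True, frequent_values.index(value)  # index needs only a few bits
    return False, value & 0xFFFFFFFF               # stored unencoded in 32 bits


def decode_word(is_frequent, payload, frequent_values):
    """Recover the original 32-bit word from its stored form."""
    return frequent_values[payload] if is_frequent else payload
```

With a table of four frequent values, for example, each frequent word would need only a 2-bit index in this sketch.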